CONTRAfold: RNA secondary structure prediction without physics-based models

Authors

  • Chuong B. Do
  • Daniel A. Woods
  • Serafim Batzoglou
Abstract

MOTIVATION: For several decades, free energy minimization methods have been the dominant strategy for single sequence RNA secondary structure prediction. More recently, stochastic context-free grammars (SCFGs) have emerged as an alternative probabilistic methodology for modeling RNA structure. Unlike physics-based methods, which rely on thousands of experimentally-measured thermodynamic parameters, SCFGs use fully-automated statistical learning algorithms to derive model parameters. Despite this advantage, however, probabilistic methods have not replaced free energy minimization methods as the tool of choice for secondary structure prediction, as the accuracies of the best current SCFGs have yet to match those of the best physics-based models.

RESULTS: In this paper, we present CONTRAfold, a novel secondary structure prediction method based on conditional log-linear models (CLLMs), a flexible class of probabilistic models which generalize upon SCFGs by using discriminative training and feature-rich scoring. In a series of cross-validation experiments, we show that grammar-based secondary structure prediction methods formulated as CLLMs consistently outperform their SCFG analogs. Furthermore, CONTRAfold, a CLLM incorporating most of the features found in typical thermodynamic models, achieves the highest single sequence prediction accuracies to date, outperforming currently available probabilistic and physics-based techniques. Our result thus closes the gap between probabilistic and thermodynamic models, demonstrating that statistical learning procedures provide an effective alternative to empirical measurement of thermodynamic parameters for RNA secondary structure prediction.

AVAILABILITY: Source code for CONTRAfold is available at http://contra.stanford.edu/contrafold/.
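As a rough illustration of the conditional log-linear idea described in the abstract, the sketch below scores a few candidate structures for one sequence by a weighted sum of feature counts and normalizes the scores into conditional probabilities. It is not CONTRAfold's implementation: the feature set (`pair_features`), the weights, and the three hand-picked candidate structures are all hypothetical, and a real method would sum over all possible structures with dynamic programming instead of enumerating a short list.

```python
import math
from collections import Counter


def pair_features(seq, structure):
    """Count toy features of a dot-bracket structure (hypothetical feature set):
    one feature per base-pair type, plus the number of unpaired bases."""
    feats = Counter()
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)            # remember the opening position
        elif ch == ")":
            j = stack.pop()            # matching opening position
            # Sort the two bases so that G-C and C-G share one feature name.
            feats["pair_" + "".join(sorted(seq[j] + seq[i]))] += 1
        else:
            feats["unpaired"] += 1
    return feats


def score(weights, feats):
    """Linear score: dot product of the weight vector and the feature counts."""
    return sum(weights.get(name, 0.0) * count for name, count in feats.items())


def conditional_probs(seq, candidates, weights):
    """P(structure | seq) over an explicit candidate list, via a stable softmax."""
    scores = [score(weights, pair_features(seq, s)) for s in candidates]
    top = max(scores)                  # subtract the max to avoid overflow
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed toy inputs: the weights are made up for illustration, not learned.
seq = "GGGAAACCC"
candidates = ["(((...)))", "((.....))", "........."]
weights = {"pair_CG": 1.0, "pair_AU": 0.5, "unpaired": -0.1}

probs = conditional_probs(seq, candidates, weights)
for structure, p in zip(candidates, probs):
    print(structure, round(p, 3))
```

In discriminative training, the weights would be chosen to maximize the conditional probability of the known structures in a training set, which is the step that distinguishes a CLLM from a generatively trained SCFG.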


Similar resources

RNA secondary structure prediction and runtime optimization

1. Background RNA secondary structure Pseudoknots Non-coding RNA 2. CONTRAfold: Probabilistic RNA folding Overview of the algorithm Details of the algorithm Performance of CONTRAfold 3. Other RNA folding methods: Physics-based models and Stochastic Context Free Grammars Physics-based models Stochastic Context Free Grammars Advantages of CONTRAfold over these other approaches 4. How RNA folding ...

A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more.

The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using comp...

A max-margin model for efficient simultaneous alignment and folding of RNA sequences

MOTIVATION The need for accurate and efficient tools for computational RNA structure analysis has become increasingly apparent over the last several years: RNA folding algorithms underlie numerous applications in bioinformatics, ranging from microarray probe selection to de novo non-coding RNA gene prediction. In this work, we present RAF (RNA Alignment and Folding), an efficient algorithm for ...

Recent Developments in RNA Secondary Structure Prediction

Here we discuss recent developments in predicting RNA secondary structure, giving an overview of methods used, including energy minimization and Boltzmann sampling, a probabilistic approach, and the use of fragment libraries generated from high resolution three-dimensional RNA crystal structures. An example for each of these three methods is discussed in detail (namely Sfold, CONTRAfold and MC-...


Journal:
  • Bioinformatics

Volume 22, Issue 14

Publication date: 2006